Abstract


Signal de-noising is closely related to regression: both estimate
a function that reflects the dependency between a set of inputs
and dependent outputs corrupted by some level of noise. Our
approach exploits this relation. In signal processing the desired
functions (signals) are usually assumed to be a linear combination
of basis functions $\phi_d(x)$, i.e.:

$$f(x) = \sum_{d=1}^{D} w_d \phi_d(x) + w_0 . \qquad (1)$$

With respect to this signal de-noising formulation, our method
consists of the following steps, which aim to recover the function
f(x) from the original noisy signal measurement:

• inputs (x) are equidistantly sampled points in the input space;
in 1-D we pre-define the sampling interval to be [-1, 1], and the
number of sampling points then depends on the selected sampling
rate. This allows us to find optimal or near-optimal parameters
for the kernel mapping (or even a particular kernel mapping) for
the different classes of signals under investigation.

• the basis functions $\phi_d(x)$ are taken to be the components
obtained by kernel PLS, which may be seen as estimates of an
orthogonal basis in the feature space defined by the kernel
function used. These estimates are obtained sequentially using
the existing correlations between the nonlinearly mapped input
data and the measured noisy signal [1].



• to set the number of basis functions D we use the VC-based
model selection criterion described in [2,3,4]. For the purposes
of this criterion, the basis functions are ordered by their
sequential extraction.

• locally based kernel PLS allows us to deal with possible
discontinuities and non-stationarity in the signal of interest.
Locality is achieved by a modified kernel PLS algorithm that
incorporates weight functions reflecting the local areas of
interest. Depending on the choice of weight function, this allows
us to construct soft or hard thresholding regions in which kernel
PLS regression models are built. The final estimate is a
composition of the individual local kernel PLS regression models.

• we compared our methodology with state-of-the-art wavelet-based
signal de-noising and smoothing-spline approaches on the heavisine
function and on artificially generated event-related potentials
distributed over individual scalp areas. Different levels of
additive white and colored noise, relative to the clean signals,
were used.



Methods 


Kernel PLS regression 

• linear PLS regression in feature space $\mathcal{F}$

• decomposition: $X = TP^T + E$ ; $Y = UC^T + F$

• latent variables (scores):

  $t_i = \sum_{k=1}^{K} w_{ik} \left( x_k - \sum_{b=1}^{i-1} t_b p_{bk} \right)$

  $u_i = f(t_i) + h_i$ - inner relation in PLS model;
  $h_i$ - vector of residuals

• NIPALS algorithm applied to PLS finds weights w, c such
that

  $[\mathrm{cov}(t,u)]^2 = [\mathrm{cov}(Xw, Yc)]^2 = \max_{|r|=|s|=1} [\mathrm{cov}(Xr, Ys)]^2$

• nonlinear (kernel) variant [1]: 

  $XX^T YY^T t = KYY^T t = \lambda t$
  $u = YY^T t$

or iterative kernel-based NIPALS algorithm

• sequential extraction t, u $\Rightarrow$ T, U

• deflation of K and Y matrices after each step 

• final regression model:

  $\hat{Y} = XB = KU(T^T K U)^{-1} T^T Y = TT^T Y = TB$

  assuming $y \in \mathcal{R}^n$:

  $\hat{y}(x) = b_1 t_1(x) + b_2 t_2(x) + \ldots + b_D t_D(x)$ ; $b = T^T y$
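The iterative kernel-based NIPALS extraction, the deflation step and the final fit listed above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`gaussian_kernel`, `center_kernel`, `kernel_pls`), the kernel parameterisation `exp(-d^2 / width)`, the start vector, the stopping rule and the toy sine signal are all choices made for the example.

```python
import numpy as np

def gaussian_kernel(x, width):
    """Gram matrix exp(-||xi - xj||^2 / width); this width convention is an assumption."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    sq = np.sum(x ** 2, axis=1)
    d2 = sq[:, None] - 2.0 * (x @ x.T) + sq[None, :]
    return np.exp(-np.maximum(d2, 0.0) / width)

def center_kernel(K):
    """Centre the mapped data in feature space (zero-mean features)."""
    n = K.shape[0]
    C = np.eye(n) - np.ones((n, n)) / n
    return C @ K @ C

def kernel_pls(K, Y, n_components, tol=1e-12, max_iter=500):
    """Sequentially extract score matrices T, U from centred K (n x n) and centred Y (n x q)."""
    n = K.shape[0]
    Kd, Yd = K.copy(), Y.copy()
    T = np.zeros((n, n_components))
    U = np.zeros((n, n_components))
    for a in range(n_components):
        u = Yd[:, [0]] / np.linalg.norm(Yd[:, 0])  # illustrative start vector
        for _ in range(max_iter):
            t = Kd @ u
            t /= np.linalg.norm(t)
            c = Yd.T @ t
            u_new = Yd @ c
            u_new /= np.linalg.norm(u_new)
            converged = np.linalg.norm(u_new - u) < tol
            u = u_new
            if converged:
                break
        T[:, [a]], U[:, [a]] = t, u
        # Deflate K and Y after each extracted component.
        P = np.eye(n) - t @ t.T
        Kd = P @ Kd @ P
        Yd = Yd - t @ (t.T @ Yd)
    return T, U

# Toy usage: smooth a noisy sine sampled equidistantly on [-1, 1].
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 80)
clean = np.sin(2.0 * np.pi * x)
noisy = clean + 0.3 * rng.standard_normal(x.size)

K = center_kernel(gaussian_kernel(x, width=0.5))
y_mean = noisy.mean()
Yc = (noisy - y_mean)[:, None]
T, U = kernel_pls(K, Yc, n_components=3)
y_hat = (T @ (T.T @ Yc)).ravel() + y_mean  # fitted values T T^T y plus the removed mean
```

Because the score vectors are normalised and mutually orthogonal, the fit reduces to the projection `T @ T.T @ Yc`, which is why the sketch never forms the regression coefficients explicitly.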



Locally based kernel PLS regression 

• soft clustering: r - vector of weights

  $r_s = \sum_{i=1}^{n} r_i$ ; $R_d = \mathrm{diag}(r)$ ; $J = \mathrm{ones}(n,1)$ ; $I = \mathrm{eye}(n)$

  $X_r = R_d \left( X - J \frac{r^T X}{r_s} \right)$ ; $\mathrm{mean}(X_r) = 0$



• kernel variant:

  $K_r = X_r X_r^T = R_d \left( I - J \frac{r^T}{r_s} \right) K \left( I - J \frac{r^T}{r_s} \right)^T R_d$

  $Y_r = R_d \left( Y - J \frac{r^T Y}{r_s} \right)$

• Gaussian kernel:

  $\|x_1 - x_2\|^2 = \langle x_1, x_1 \rangle - 2\langle x_1, x_2 \rangle + \langle x_2, x_2 \rangle$
  $\|\psi(x_1) - \psi(x_2)\|^2 = 2 - 2\exp(-\ldots)$
• local kernel PLS regression
  m-th cluster defined by weight vector $r^m$:
  sequential extraction of t, u from
  $K_r^m Y_r^m (Y_r^m)^T$ ; $Y_r^m (Y_r^m)^T \Rightarrow T^m, U^m$

  $\hat{Y}_r^m = T^m (T^m)^T Y_r^m \Rightarrow \hat{Y}^m = (R_d^m)^{-1} \hat{Y}_r^m + J \frac{(r^m)^T Y}{r_s^m}$


  final model (M clusters):

  $\hat{y}_i = \sum_{m=1}^{M} r_i^m \hat{y}_i^m \Big/ \sum_{m=1}^{M} r_i^m$
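The weighted centring and the composition of local models can be illustrated with a short NumPy sketch. It covers only the bookkeeping around the local fits (weighted centring of K and Y, undoing the weighting, and a weighted average of the local estimates); the kernel PLS extraction itself is left out. The function names are hypothetical, strictly positive weights are assumed so that the diagonal weight matrix can be inverted, and the weighted-average form of the final model is an assumption of the example.

```python
import numpy as np

def weighted_center(K, Y, r):
    """Weighted centring: K_r = R_d C K C^T R_d and Y_r = R_d (Y - J r^T Y / r_s),
    with C = I - J r^T / r_s. Returns K_r, Y_r and the weighted mean of Y."""
    r = np.asarray(r, dtype=float)
    n = r.size
    r_s = r.sum()
    J = np.ones((n, 1))
    R_d = np.diag(r)
    C = np.eye(n) - J @ r[None, :] / r_s
    K_r = R_d @ C @ K @ C.T @ R_d
    y_mean = (r @ Y) / r_s            # weighted mean of each output column
    Y_r = R_d @ (Y - y_mean)
    return K_r, Y_r, y_mean

def uncenter(Y_r_hat, r, y_mean):
    """Map a local fit back to the original scale: R_d^{-1} Y_r_hat + J r^T Y / r_s.
    Requires strictly positive weights."""
    return Y_r_hat / np.asarray(r, dtype=float)[:, None] + y_mean

def combine(local_fits, weights):
    """Weighted average of the local estimates, sample by sample."""
    num = sum(np.asarray(r)[:, None] * y for r, y in zip(weights, local_fits))
    den = sum(np.asarray(r) for r in weights)[:, None]
    return num / den
```

With hard (0/1) weights the inverse of the weight matrix does not exist, so a practical implementation would need to restrict each local model to the samples with non-zero weight; the sketch sidesteps this by assuming positive weights everywhere.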








VC-based complexity control 


• for regression problems with squared loss the following bound
on prediction risk (PR) holds with probability $1 - \eta$ [4]:

  $PR \le \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \left( 1 - c \sqrt{\frac{h (\ln(\frac{an}{h}) + 1) - \ln \eta}{n}} \right)_+^{-1}$

  h - VC dimension of the set of approximating functions
  c - constant reflecting the "tails of the loss function
      distribution"
  a - theoretical constant
  $(x)_+ = x$ if $x > 0$, 0 otherwise

  Cherkassky et al. constructed an empirical (heuristic) version
  of Vapnik's measure [2,3] to compute the estimated risk (ER):

  $ER = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \left( 1 - \sqrt{b - b \ln b + \frac{\ln n}{2n}} \right)_+^{-1}$

  b = (d + 1)/n, where d + 1 represents the VC dimension of the
  approximating function (1) with d terms
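The estimated risk can be evaluated directly from the residuals of a fit with d terms. The sketch below is an illustration under stated assumptions: the names `estimated_risk` and `select_num_components` are hypothetical, the penalisation factor uses the commonly quoted form with ln(n)/(2n) under the square root, and an infinite risk is returned when the factor inside the (.)_+ operator is not positive.

```python
import numpy as np

def estimated_risk(y, y_hat, d):
    """Empirical risk times the penalisation factor with b = (d + 1) / n."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    n = y.size
    b = (d + 1) / n
    if not 0.0 < b <= 1.0:
        raise ValueError("need 0 < (d + 1) / n <= 1")
    empirical = np.mean((y - y_hat) ** 2)
    factor = 1.0 - np.sqrt(b - b * np.log(b) + np.log(n) / (2.0 * n))
    if factor <= 0.0:
        return np.inf          # (x)_+ is zero, so the bound is vacuous
    return empirical / factor

def select_num_components(y, fits):
    """fits[d - 1] holds the fitted values of the model with d terms.
    Returns the d with the smallest estimated risk, plus all risks."""
    risks = [estimated_risk(y, f, d) for d, f in enumerate(fits, start=1)]
    return int(np.argmin(risks)) + 1, risks
```

Because the basis functions are ordered by their sequential extraction, `fits` would simply hold the fits obtained after 1, 2, 3, ... extracted components.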



Smoothing splines 

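• $\min_f \frac{1}{n} \sum_{i=1}^{n} (y_i - f(x_i))^2 + \lambda \int (f''(x))^2 dx$ ; $\lambda > 0$

  natural cubic splines with knots at $x_i$ ; i = 1, ..., n

• $\hat{f} = Ns$ ; $s = (N^T N + \lambda \Omega)^{-1} N^T y$

  $\{N\}_{ij} = N_j(x_i)$ ; j,k = 1, ...

• $\hat{f} = S_\lambda y \Rightarrow df_\lambda = \mathrm{trace}(S_\lambda)$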


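  squared loss:

  $\frac{1}{n} \sum_{i=1}^{n} \left( \frac{y_i - \hat{f}_\lambda(x_i)}{1 - \{S_\lambda\}_{ii}} \right)^2 \approx \frac{1}{n} \sum_{i=1}^{n} \left( \frac{y_i - \hat{f}_\lambda(x_i)}{1 - \mathrm{trace}(S_\lambda)/n} \right)^2$

  complete basis $\rightarrow$ shrink the coefficients toward smoothness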
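The degrees of freedom and the two squared-loss scores depend only on the smoother matrix, so they can be illustrated with any linear smoother. The sketch below deliberately swaps in a simpler technique: a discrete second-difference (Whittaker-type) roughness penalty stands in for the natural cubic spline basis, which keeps the example short. The function names and the choice of penalty are assumptions of the example.

```python
import numpy as np

def smoother_matrix(n, lam):
    """S = (I + lam * D^T D)^{-1}, where D takes second differences.
    A stand-in for the spline smoother matrix, not a natural cubic spline."""
    D = np.diff(np.eye(n), n=2, axis=0)      # shape (n - 2, n)
    return np.linalg.inv(np.eye(n) + lam * D.T @ D)

def effective_df(S):
    """Effective degrees of freedom of a linear smoother: trace(S)."""
    return np.trace(S)

def cv_and_gcv(S, y):
    """Leave-one-out CV score via the diagonal of S, and its GCV
    approximation that replaces each S_ii by trace(S) / n."""
    y = np.asarray(y, dtype=float)
    resid = y - S @ y
    cv = np.mean((resid / (1.0 - np.diag(S))) ** 2)
    gcv = np.mean((resid / (1.0 - np.trace(S) / y.size)) ** 2)
    return cv, gcv
```

In practice one would evaluate `cv_and_gcv` over a grid of penalty values and keep the value with the smallest score.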


Wavelet smoothing 

• complete orthonormal basis $\rightarrow$ shrink and select the
coefficients toward a sparse representation

• wavelet basis is localized in time and frequency

• $y^* = W^T y$ ; discrete wavelet transform
  (i.e. full LS regression coefficients)

  $W$ - $n \times n$ orthonormal basis

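• SURE shrinkage: $\min_\theta \|y - W\theta\|_2^2 + 2\lambda \|\theta\|_1$

  $\hat{\theta}_j = \mathrm{sign}(y_j^*) (|y_j^*| - \lambda)_+$

  $\lambda = \sigma \sqrt{2 \log n}$

• $\hat{f} = W\hat{\theta}$ ; inverse wavelet transform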
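The transform, soft-threshold and inverse-transform steps above can be sketched with a hand-written orthonormal Haar transform, which keeps the example self-contained. Everything specific here is an assumption made for illustration: the Haar wavelet itself, the function names, passing the noise level `sigma` in as a known parameter, and leaving the coarsest (approximation) coefficient unshrunk.

```python
import numpy as np

def haar_forward(y):
    """Full-depth orthonormal Haar transform; length must be a power of two.
    Returns the final approximation coefficient and the details (finest first)."""
    approx = np.asarray(y, dtype=float)
    n = approx.size
    if n < 2 or n & (n - 1):
        raise ValueError("length must be a power of two")
    details = []
    while approx.size > 1:
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))
        approx = (even + odd) / np.sqrt(2.0)
    return approx, details

def haar_inverse(approx, details):
    """Inverse of haar_forward."""
    for d in reversed(details):
        out = np.empty(2 * d.size)
        out[0::2] = (approx + d) / np.sqrt(2.0)
        out[1::2] = (approx - d) / np.sqrt(2.0)
        approx = out
    return approx

def soft_threshold(coef, lam):
    """sign(c) * (|c| - lam)_+ applied element-wise."""
    return np.sign(coef) * np.maximum(np.abs(coef) - lam, 0.0)

def wavelet_denoise(y, sigma):
    """Shrink detail coefficients with the threshold sigma * sqrt(2 log n)."""
    approx, details = haar_forward(y)
    lam = sigma * np.sqrt(2.0 * np.log(len(y)))
    shrunk = [soft_threshold(d, lam) for d in details]
    return haar_inverse(approx, shrunk)
```

When the noise level is unknown it has to be estimated from the data; a robust estimate from the finest-scale coefficients is a common choice, but the source does not say which estimate was used.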



Data construction 


• heavisine function 

additive noise: white Gaussian 

• event-related potentials (ERP) - N100, P300
  additive noise: white Gaussian,
  relaxed-state, spatially distributed EEG signal
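A test signal of the first kind can be generated as sketched below. The heavisine definition is the standard benchmark form on [0, 1]; the mapping from the sampling interval [-1, 1] onto [0, 1], the SNR convention (ratio of mean signal power to noise variance, in dB) and the function names are assumptions of this example, since the source does not spell them out.

```python
import numpy as np

def heavisine(t):
    """Standard heavisine benchmark signal for t in [0, 1]."""
    t = np.asarray(t, dtype=float)
    return 4.0 * np.sin(4.0 * np.pi * t) - np.sign(t - 0.3) - np.sign(0.72 - t)

def add_white_noise(signal, snr_db, rng):
    """Add white Gaussian noise so that 10 * log10(signal power / noise variance)
    equals snr_db (assumed SNR convention)."""
    signal = np.asarray(signal, dtype=float)
    noise_var = np.mean(signal ** 2) / 10.0 ** (snr_db / 10.0)
    return signal + rng.normal(0.0, np.sqrt(noise_var), size=signal.shape)

def make_noisy_heavisine(n, snr_db, seed=0):
    """Equidistant inputs on [-1, 1], clean heavisine values and a noisy copy."""
    rng = np.random.default_rng(seed)
    x = np.linspace(-1.0, 1.0, n)
    clean = heavisine((x + 1.0) / 2.0)   # map [-1, 1] onto [0, 1]
    return x, clean, add_white_noise(clean, snr_db, rng)
```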




















Let’s cheat - a little bit !! 



Noise       SNR 1dB        SNR 5dB        SNR 10dB       SNR 15dB
   n      LKPLS  Wave    LKPLS  Wave    LKPLS  Wave    LKPLS  Wave
 128       .22   .33      .18   .22      .13   .14      .10   .10
 256       .17   .25      .13   .17      .10   .12      .08   .08
 512       .12   .19      .10   .14      .09   .10      .07   .07
1024       .09   .15      .09   .11      .08   .08      .07   .05
2048        -    .12       -    .09       -    .06       -    .04

Table 2: Normalized root mean squared error. 
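An error measure of this kind can be computed as in the sketch below. The normalisation is an assumption: the root mean squared error between the clean signal and the estimate is divided by the standard deviation of the clean signal, so that predicting the mean of the clean signal scores exactly 1. The source does not state which normalisation was used.

```python
import numpy as np

def nrmse(clean, estimate):
    """Root mean squared error divided by the standard deviation of the
    clean signal (assumed normalisation)."""
    clean = np.asarray(clean, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    rmse = np.sqrt(np.mean((clean - estimate) ** 2))
    return rmse / np.std(clean)
```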
• ERP - spontaneous EEG-like noise

Discussion, future work


• results comparable with existing state-of-the-art smoothing and
de-noising techniques

• the construction of the (locally based) kernel PLS regression
basis allows prior knowledge about the signal of interest to be
incorporated

• the dimensionality of the input samples is not a crucial problem
in (locally based) kernel PLS smoothing - e.g. image de-noising

• multivariate (locally based) kernel PLS allows a straightforward
extension to higher-dimensional smoothing problems; in addition,
the existing correlations among the signals determine the basis
construction - e.g. spatio-temporal smoothing of EEG recordings

• possibility of combining shrinkage and selection techniques, or
of using better model selection techniques?

• can the computational disadvantages of kernel-based approaches
be compensated by "segmentation" in the case of locally based
kernel PLS?

• smoothing real-world biological signals - ERP, eye-blinks, etc.